Semi-supervised support vector machines (S3VMs) are a popular family of approaches that try to improve
learning performance by exploiting unlabeled data. Though S3VMs have been found helpful in many
situations, they may degrade performance, and the resulting generalization ability may be even worse
than that obtained using the labeled data only. In this paper, we try to reduce the chance of performance
degeneration of S3VMs. Our basic idea is that, rather than exploiting all unlabeled data, the unlabeled
instances should be selected such that only the ones which are very likely to be helpful are exploited,
while some highly risky unlabeled instances are avoided. We propose the S3VM-us method, which uses
hierarchical clustering to select the unlabeled instances. Experiments on a broad range of data sets over
eighty-eight different settings show that the chance of performance degeneration of S3VM-us is much
smaller than that of existing S3VMs.

1. Introduction

In many real situations there are plentiful unlabeled training data while the acquisition of class labels is 
costly and difficult. Semi-supervised learning tries to exploit unlabeled data to help improve learning 
performance, particularly when there are limited labeled training examples. During the past decade, 
semi-supervised learning has received significant attention and many approaches have been developed.

Among the many semi-supervised learning approaches, S3VMs (semi-supervised support vector machines)
[...] are popular and have solid theoretical foundations. However, though the performances of S3VMs
are promising in many tasks, it has been found that there are cases where, by using unlabeled data,
the performances of S3VMs are even worse than those of SVMs simply using the labeled data [25, ...].
To enable S3VMs to be accepted by more users in more application areas, it is desirable to reduce the 
chances of performance degeneration by using unlabeled data. 

In this paper, we focus on transductive learning and present the S3VM-us (S3VM with Unlabeled instances
Selection) method. Our basic idea is that, given a set of unlabeled data, it may not be adequate to use
all of them without any sanity check; instead, it may be better to use only the unlabeled instances which
are very likely to be helpful while avoiding the unlabeled instances which carry high risk. To exclude
highly risky unlabeled instances, we first introduce two baselines: the first uses a standard clustering
technique motivated by the discernibility of density sets [...], while the other uses a label propagation
technique motivated by confidence estimation. Then, based on an analysis of the deficiencies of the two
baseline approaches, we propose the S3VM-us method, which employs hierarchical clustering to help select
unlabeled instances. Comprehensive experiments on a broad range of data sets over eighty-eight different
settings show that the chance of performance degeneration of S3VM-us is much smaller than that of TSVM
[...], while the overall performance of S3VM-us is competitive with TSVM.

The rest of this paper is organized as follows. Section 2 briefly reviews some related work. Section 3
introduces two baseline approaches. Section 4 presents our S3VM-us method. Experimental results are
reported in Section 5. The last section concludes this paper.

2. Related Work 



Roughly speaking, existing semi-supervised learning approaches mainly fall into four categories. The first
category is generative methods, e.g., [...], which extend supervised generative models by exploiting
unlabeled data in parameter estimation and label estimation using techniques such as the EM method. The
second category is graph-based methods, e.g., [..., 26], which encode both the labeled and unlabeled
instances in a graph and then perform label propagation on the graph. The third category is
disagreement-based methods, e.g., [..., 27], which employ multiple learners and improve the learners
through labeling the unlabeled data based on the exploitation of disagreement among the learners. The
fourth category is S3VMs, e.g., [...], which use unlabeled data to regularize the decision boundary to go
through low density regions.

Though semi-supervised learning approaches have shown promising performances in many situations, many
authors have indicated that using unlabeled data may hurt the performance [20, 25, ..., 27, ...]. In some
application areas, especially those which require high reliability, users might be reluctant to use
semi-supervised learning approaches for fear of obtaining a performance worse than that of simply
neglecting unlabeled data. As typical semi-supervised learning approaches, S3VMs also suffer from this
deficiency.



The usefulness of unlabeled data has been discussed theoretically [...] and validated empirically [...].
Much of the literature indicates that unlabeled data should be used carefully. For generative methods,
Cozman et al. [...] showed that unlabeled data can increase error even in situations where additional
labeled data would decrease error. The main conjecture attributes this performance degeneration to the
difficulty of making a correct model assumption, without which fitting the model to unlabeled data can
degrade performance. For graph-based methods, more and more researchers recognize that graph
construction is more crucial than how the labels are propagated, and some attempts have been devoted to
using domain knowledge or constructing robust graphs [14]. As for disagreement-based methods, the
generalization ability has been studied with plentiful theoretical results based on different assumptions
[...]. As for S3VMs, the correctness of the S3VM objective has been studied on small data sets [...].



It is noteworthy that, though much work has been devoted to coping with the high complexity of S3VMs
[...], there has been no proposal on how to reduce the chance of performance degeneration caused by
using unlabeled data. A relevant work uses data editing techniques in semi-supervised learning [...];
however, it tries to remove or fix suspicious unlabeled data during the training process, while our
proposal tries to select unlabeled instances for S3VM and SVM predictions after the S3VM and SVM have
already been trained.



3. Two Baseline Approaches 



As mentioned, our intuition is to use only the unlabeled data which are very likely to help improve the
performance and to leave unexploited the unlabeled data which carry high risk. In this way, the chance
of performance degeneration may be significantly reduced. Current S3VMs can be regarded as one extreme
case, which believes that all unlabeled data carry low risk and therefore all of them should be used,
while inductive SVMs which use labeled data only can be regarded as the other extreme case, which
believes that all the unlabeled data are highly risky and therefore only labeled data are used.

Specifically, we consider the following problem: once we have obtained the predictions of inductive SVM
and S3VM, how can we remove the risky predictions of S3VM such that the resultant performance is often
better and rarely worse than that of inductive SVM?

Two simple ideas can readily be worked out to address the above problem, leading to two baseline
approaches, namely S3VM-c and S3VM-p.

In the sequel, suppose we are given a training data set $\mathcal{D} = \mathcal{L} \cup \mathcal{U}$, where
$\mathcal{L} = \{(x_1, y_1), \ldots, (x_l, y_l)\}$ denotes the set of labeled data and
$\mathcal{U} = \{x_{l+1}, \ldots, x_{l+u}\}$ denotes the set of unlabeled data. Here $x \in \mathcal{X}$
is an instance and $y \in \{+1, -1\}$ is the label. We further let $y_{SVM}(x)$ and $y_{S3VM}(x)$ denote the
predicted labels on $x$ by inductive SVM and S3VM, respectively.

3.1. S3VM-c

The first baseline approach is motivated by the analysis in [...], which suggests that unlabeled data help
when the component density sets are discernible. Here, one can simulate the component density sets
by clusters, and discernibility by a condition on the disagreement between S3VM and inductive SVM. We
consider the disagreement using two factors, i.e., bias and confidence. When S3VM obtains the same bias
as inductive SVM and enhances the confidence of inductive SVM, one should use the results of S3VM;
otherwise it may be risky to totally trust the prediction of S3VM.

Algorithm 1 gives the S3VM-c method and Figure 1(d) illustrates the intuition of S3VM-c. As can be seen,
S3VM-c inherits the correct predictions of S3VM on groups {1, 4} while avoiding the wrong predictions of
S3VM on groups {7, 8, 9, 10}.

3.2. S3VM-p

The second baseline approach is motivated by confidence estimation in graph-based methods, e.g., [30],
where the confidence can be naturally regarded as a risk measurement of unlabeled data.
Figure 1: Illustration with artificial three-moon data. (a) Labeled data (empty and filled circles) and unlabeled
data (gray points). The blocked numbers highlight groups of four unlabeled instances. Classification results of (b)
inductive SVM (using labeled data only); (c) S3VM; (d) S3VM-c, where each circle represents a cluster; (e) S3VM-p;
(f) our proposed S3VM-us.

Formally, to estimate the confidence of unlabeled data, let
$F^l = [(\mathbf{y}_l + 1)/2, (1 - \mathbf{y}_l)/2] \in \{0, 1\}^{l \times 2}$ be the label matrix for labeled
data, where $\mathbf{y}_l = [y_1, \ldots, y_l]' \in \{\pm 1\}^{l \times 1}$ is the label vector. Let
$W = [w_{ij}] \in \mathbb{R}^{(l+u) \times (l+u)}$ be the weight matrix of the training data and $\Delta$ the
Laplacian of $W$, i.e., $\Delta = D - W$, where $D = \mathrm{diag}(d_i)$ is a diagonal matrix with entries
$d_i = \sum_j w_{ij}$. Then, the predictions of unlabeled data can be obtained by [30]

$$F^u = \Delta_{u,u}^{-1} W_{u,l} F^l,$$

where $\Delta_{u,u}$ is the sub-matrix of $\Delta$ with respect to the block of unlabeled data, while $W_{u,l}$
is the sub-matrix of $W$ with respect to the block between labeled and unlabeled data. Then, assign each
point $x_i$ the label $y_{lp}(x_i) = \mathrm{sgn}(F^u_{i-l,1} - F^u_{i-l,2})$ and the confidence
$h_i = |F^u_{i-l,1} - F^u_{i-l,2}|$. After confidence estimation, similar to S3VM-c, we consider the risk of
unlabeled data by two factors, i.e., bias and confidence. If S3VM obtains the same bias as label propagation
and the confidence is high enough, we use the S3VM prediction; otherwise we take the SVM prediction.
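
The closed-form propagation above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper `gaussian_weights`, its `sigma` parameter and the convention of breaking a zero margin toward +1 are hypothetical choices, not details taken from the paper, and the labeled instances are assumed to occupy the first l rows of the weight matrix.

```python
import numpy as np

def label_propagation(W, y_labeled):
    """Harmonic label propagation: F^u = inv(Delta_uu) @ W_ul @ F^l.

    W: (l+u, l+u) symmetric nonnegative weights, labeled instances first.
    y_labeled: length-l sequence of +1/-1 labels.
    Returns (predicted labels, confidences) for the u unlabeled instances.
    """
    y = np.asarray(y_labeled, dtype=float)
    l = y.shape[0]
    F_l = np.column_stack([(y + 1) / 2, (1 - y) / 2])    # l x 2 label matrix
    laplacian = np.diag(W.sum(axis=1)) - W               # Delta = D - W
    # Solve Delta_uu F^u = W_ul F^l instead of forming the inverse explicitly.
    F_u = np.linalg.solve(laplacian[l:, l:], W[l:, :l] @ F_l)
    margin = F_u[:, 0] - F_u[:, 1]
    labels = np.where(margin >= 0, 1, -1)                # assumed tie-break: +1
    return labels, np.abs(margin)

def gaussian_weights(X, sigma=1.0):
    """Dense Gaussian similarity graph with zero diagonal (illustrative choice)."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W
```

For example, with two labeled points at 0 (+1) and 10 (-1) on a line, an unlabeled point at 1 receives the label +1 and a point at 9 receives -1, each with a confidence between 0 and 1.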



Algorithm 2 gives the S3VM-p method and Figure 1(e) illustrates the intuition of S3VM-p. As can
be seen, the correct predictions of S3VM on groups {2, 3} are inherited by S3VM-p, while the wrong
predictions of S3VM on groups {7, 8, 9, 10} are avoided.



Algorithm 1 S3VM-c

Input: $y_{SVM}$, $y_{S3VM}$, $\mathcal{D}$ and parameter $k$
1: Perform partitional clustering (e.g., k-means) on $\mathcal{D}$. Denote $C_1, \ldots, C_k$ as the data
   indices of each cluster, respectively.
2: For each cluster $i = 1, \ldots, k$, calculate the label bias $lb$ and confidence $cf$ of SVM and S3VM
   according to

   $lb^i_{S(3)VM} = \mathrm{sign}\big( \sum_{j \in C_i} y_{S(3)VM}(x_j) \big)$

   $cf^i_{S(3)VM} = \big| \sum_{j \in C_i} y_{S(3)VM}(x_j) \big|$

3: If $lb^i_{SVM} = lb^i_{S3VM}$ and $cf^i_{S3VM} > cf^i_{SVM}$, use the prediction of S3VM; otherwise use
   the prediction of SVM.
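
The cluster-wise rule of Algorithm 1 can be sketched as follows. The function name `s3vm_c` and the argument `cluster_ids` are hypothetical; the sketch assumes the cluster assignment has already been produced by some partitional clustering (such as k-means) and that both prediction vectors cover the same instances.

```python
import numpy as np

def s3vm_c(y_svm, y_s3vm, cluster_ids):
    """Choose, cluster by cluster, between SVM and S3VM predictions.

    y_svm, y_s3vm: arrays of +1/-1 predictions on the same instances.
    cluster_ids: one cluster index per instance (assumed precomputed).
    """
    y_svm = np.asarray(y_svm)
    y_s3vm = np.asarray(y_s3vm)
    cluster_ids = np.asarray(cluster_ids)
    y_out = y_svm.copy()                        # default: trust inductive SVM
    for c in np.unique(cluster_ids):
        members = cluster_ids == c
        sum_svm = y_svm[members].sum()
        sum_s3vm = y_s3vm[members].sum()
        same_bias = np.sign(sum_svm) == np.sign(sum_s3vm)   # lb_SVM = lb_S3VM
        more_confident = abs(sum_s3vm) > abs(sum_svm)       # cf_S3VM > cf_SVM
        if same_bias and more_confident:
            y_out[members] = y_s3vm[members]
    return y_out
```

In this sketch a cluster switches to the S3VM predictions only when both conditions of step 3 hold; every other cluster keeps the SVM predictions.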

Algorithm 2 S3VM-p

Input: $y_{SVM}$, $y_{S3VM}$, $\mathcal{D}$, $W$ and parameter $\eta$
1: Perform label propagation (e.g., [30]) with $W$, obtain the predicted label $y_{lp}(x_i)$ and confidence
   $h_i$ for each unlabeled instance $x_i$, $i = l + 1, \ldots, l + u$.
2: Update $h$ according to

   $h_i = y_{S3VM}(x_i) \, y_{lp}(x_i) \, h_i, \quad i = l + 1, \ldots, l + u.$

   Let $c$ denote the number of nonnegative entries in $h$.
3: Sort $h$, pick up the top-$\min\{\eta u, c\}$ values and use the predictions of S3VM for the corresponding
   unlabeled instances, otherwise use the predictions of SVM.
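
Steps 2 and 3 of Algorithm 2 can be sketched as below, assuming the label propagation output (`y_lp`, `h`) is already available. The function name `s3vm_p` is hypothetical, and rounding $\eta u$ down to an integer is an assumption of this sketch, since the algorithm does not say how a fractional count is handled.

```python
import numpy as np

def s3vm_p(y_svm, y_s3vm, y_lp, h, eta=0.1):
    """Keep S3VM predictions only where label propagation agrees confidently.

    y_svm, y_s3vm, y_lp: +1/-1 predictions on the u unlabeled instances.
    h: nonnegative label propagation confidences, one per unlabeled instance.
    """
    y_svm, y_s3vm, y_lp = (np.asarray(a) for a in (y_svm, y_s3vm, y_lp))
    # Step 2: the confidence becomes negative where S3VM and propagation disagree.
    h_signed = y_s3vm * y_lp * np.asarray(h, dtype=float)
    u = h_signed.shape[0]
    c = int((h_signed >= 0).sum())
    n_keep = min(int(eta * u), c)                   # assumed: floor of eta * u
    # Step 3: indices of the n_keep largest signed confidences.
    order = np.argsort(-h_signed, kind="stable")
    keep = order[:n_keep]
    y_out = y_svm.copy()
    y_out[keep] = y_s3vm[keep]
    return y_out
```

With `eta=0.1`, the value used in the experiments, at most ten percent of the unlabeled instances ever receive the S3VM prediction.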

4. Our Proposed Method 

4.1. Deficiencies of S3VM-c and S3VM-p

S3VM-c and S3VM-p are capable of reducing the chance of performance degeneration caused by using
unlabeled data; however, they both suffer from some deficiencies. S3VM-c works in a local manner and the
relations between clusters are never considered, so some helpful unlabeled instances are left unexploited,
e.g., the unlabeled instances in groups {2, 3} in Figure 1(d). For S3VM-p, as stated in [22], the confidence
estimated by the label propagation approach might be incorrect if the label initialization is highly
imbalanced, so some useful unlabeled instances are left unexploited, e.g., groups {4, 5} in Figure 1(e).




Figure 2: Illustration with artificial two-moon data when S3VM degenerates performance. (a) Labeled data (empty
and filled circles) and unlabeled data (gray points). The blocked number highlights a group of four unlabeled
instances. Classification results of (b) S3VM-c, where each circle represents a cluster; (c) S3VM-p; (d) our
proposed S3VM-us.

Moreover, both S3VM-c and S3VM-p heavily rely on the predictions of S3VM, which might become a
serious issue especially when S3VM obtains degenerated performance. Figures 2(b) and 2(c) illustrate the
behaviors of S3VM-c and S3VM-p when S3VM degenerates performance. Both S3VM-c and S3VM-p
erroneously inherit the wrong predictions of S3VM on group 1.

4.2. S3VM-us

The deficiencies of S3VM-c and S3VM-p suggest taking the relations between clusters into account and
making the method insensitive to label initialization. This motivates us to use hierarchical clustering
[13], leading to our proposed method S3VM-us.

Hierarchical clustering works in a greedy and iterative manner. It first initializes each single instance
as a cluster and then, at each step, merges the two clusters with the shortest distance among all pairs of
clusters. In this way, the cluster relation is considered; moreover, since hierarchical clustering works in
an unsupervised setting, it does not suffer from the label initialization problem.

Suppose $p_i$ and $n_i$ are the lengths of the paths from the instance $x_i$ to its nearest positive and
negative labeled instances, respectively, in hierarchical clustering. We simply take the difference between
$p_i$ and $n_i$ as an estimate of the confidence on the unlabeled instance $x_i$. Intuitively, the larger the
difference between $p_i$ and $n_i$, the higher the confidence in labeling $x_i$.



Algorithm 3 S3VM-us

Input: $y_{SVM}$, $y_{S3VM}$, $\mathcal{D}$ and parameter $\epsilon$
1: Let $\mathcal{S}$ be the set of unlabeled data $x$ such that $y_{SVM}(x) \neq y_{S3VM}(x)$.
2: Perform hierarchical clustering, e.g., the single linkage method [13].
3: For each unlabeled instance $x_i \in \mathcal{S}$, calculate $p_i$ and $n_i$, that is, the lengths of the
   paths from $x_i$ to its nearest positive and negative labeled instances, respectively. Denote
   $t_i = (n_i - p_i)$.
4: Let $\mathcal{B}$ be the set of unlabeled instances $x_i$ in $\mathcal{S}$ satisfying
   $|t_i| > \epsilon |l + u|$.
5: If $\sum_{x_i \in \mathcal{B}} y_{S3VM}(x_i) t_i > \sum_{x_i \in \mathcal{B}} y_{SVM}(x_i) t_i$, predict the
   unlabeled instances in $\mathcal{B}$ by S3VM and otherwise by SVM.
6: Predict the unlabeled data $x \notin \mathcal{B}$ by SVM.

Algorithm 3 gives the S3VM-us method and Figures 1(f) and 2 illustrate the intuition of S3VM-us. As
can be seen, the wrong predictions of S3VM on groups {7, 8, 9, 10} are avoided by S3VM-us, the correct
predictions of S3VM on groups {2, 3, 4, 5} are inherited, and S3VM-us does not erroneously inherit the
wrong predictions of S3VM on group 1 in Figure 2.
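
The selection procedure can be sketched end to end as below. The text does not spell out how a path length in the hierarchy is measured, so this sketch makes a loud assumption: single-linkage merging is simulated with sorted pairwise distances, and $p_i$ (or $n_i$) is taken to be the number of merges performed when the cluster of $x_i$ first contains a positive (or negative) labeled instance. The function names, this merge-count definition, and the choice to give every instance outside the selected set the SVM prediction are all assumptions of the sketch; it also assumes both classes occur among the labeled instances.

```python
import numpy as np

def linkage_path_lengths(X, y_labeled):
    """Merge counts at which each instance first shares a single-linkage
    cluster with a positive (p) and a negative (n) labeled instance.
    The first len(y_labeled) rows of X are the labeled instances."""
    m, l = X.shape[0], len(y_labeled)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    rows, cols = np.triu_indices(m, k=1)
    order = np.argsort(dist[rows, cols], kind="stable")   # closest pairs first
    members = {i: [i] for i in range(m)}                  # cluster id -> instances
    cluster_of = np.arange(m)
    has_pos = {i: bool(i < l and y_labeled[i] == 1) for i in range(m)}
    has_neg = {i: bool(i < l and y_labeled[i] == -1) for i in range(m)}
    p = np.where([has_pos[i] for i in range(m)], 0.0, np.inf)
    n = np.where([has_neg[i] for i in range(m)], 0.0, np.inf)
    step = 0
    for e in order:
        a, b = int(cluster_of[rows[e]]), int(cluster_of[cols[e]])
        if a == b:
            continue                                      # already merged
        step += 1
        for src, dst in ((a, b), (b, a)):
            # Members of src reach a label that only dst held so far.
            if has_pos[dst] and not has_pos[src]:
                p[members[src]] = step
            if has_neg[dst] and not has_neg[src]:
                n[members[src]] = step
        has_pos[a] = has_pos[a] or has_pos[b]
        has_neg[a] = has_neg[a] or has_neg[b]
        cluster_of[members[b]] = a
        members[a].extend(members.pop(b))
    return p, n

def s3vm_us(X, y_labeled, y_svm, y_s3vm, eps=0.1):
    """y_svm, y_s3vm: +1/-1 predictions on the u unlabeled rows of X."""
    y_svm, y_s3vm = np.asarray(y_svm), np.asarray(y_s3vm)
    m, l = X.shape[0], len(y_labeled)
    p, n = linkage_path_lengths(X, np.asarray(y_labeled))
    t = (n - p)[l:]                                       # t_i = n_i - p_i
    in_B = (y_svm != y_s3vm) & (np.abs(t) > eps * m)      # steps 1 and 4
    y_out = y_svm.copy()                                  # default: SVM
    if (y_s3vm[in_B] * t[in_B]).sum() > (y_svm[in_B] * t[in_B]).sum():
        y_out[in_B] = y_s3vm[in_B]                        # step 5
    return y_out
```

Under this definition an unlabeled instance that joins a positive labeled instance early and a negative one late gets a large positive $t_i$, so an S3VM prediction of +1 on it counts in favor of S3VM in step 5.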

5. Experiments 

5.1. Settings 

We evaluate S3VM-us on a broad range of data sets including the semi-supervised learning benchmark
data sets in [6] and sixteen UCI data sets. The benchmark data sets are g241c, g241d, Digit1, USPS,
TEXT and BCI. For each data set, the archive provides two versions, one using 10 labeled examples and
the other using 100 labeled examples. As for the UCI data sets, we randomly select 10 and 100 examples
to be used as labeled examples, respectively, and use the remaining data as unlabeled data. The
experiments are repeated 30 times and the average accuracies and standard deviations are recorded. It
is worth noting that in semi-supervised learning, labeled examples are often too few to afford a valid
cross validation, and therefore hold-out tests are usually used for the evaluation.

In addition to S3VM-c and S3VM-p, we compare with inductive SVM and TSVM [...]. Both linear and
Gaussian kernels are used. For the benchmark data sets, we follow the setup in [...]. Specifically, for the
case of 10 labeled examples, the parameter $C$ for SVM is fixed to $m / \sum_{i=1}^{m} \|x_i\|^2$, where
$m = l + u$ is the size of the data set, and the Gaussian kernel width is set to $\delta$, i.e., the average
distance between instances. For the case of 100 labeled examples, $C$ is fixed to 100 and the Gaussian
kernel width is selected from $\{0.25\delta, 0.5\delta, \delta, 2\delta, 4\delta\}$ by cross validation. On the
UCI data sets, the parameter $C$ is fixed to 1 and the Gaussian kernel width is set to $\delta$ for 10 labeled
examples. For 100 labeled examples, the parameter $C$ is selected from $\{0.1, 1, 10, 100\}$ and the Gaussian
kernel width is selected from $\{0.25\delta, 0.5\delta, \delta, 2\delta, 4\delta\}$ by cross validation. For
S3VM-c, the cluster number $k$ is fixed to 50; for S3VM-p, the weight matrix is constructed via Gaussian
distance and the parameter $\eta$ is fixed to 0.1; for S3VM-us, the parameter $\epsilon$ is fixed to 0.1.

http://archive.ics.uci.edu/ml/
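
The data-dependent defaults for the 10-labeled-example benchmark setting can be computed as in the sketch below. The function names are hypothetical, and reading "the average distance between instances" as the mean Euclidean distance over all distinct pairs is an assumption of this sketch.

```python
import numpy as np

def default_svm_parameters(X):
    """Return (C, delta) for a data matrix X of shape (m, d), m = l + u."""
    m = X.shape[0]
    C = m / (X ** 2).sum()                          # C = m / sum_i ||x_i||^2
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Assumed reading: average over all distinct pairs of instances.
    delta = dist[np.triu_indices(m, k=1)].mean()
    return C, delta

def kernel_width_grid(delta):
    """Candidate Gaussian kernel widths searched by cross validation."""
    return [0.25 * delta, 0.5 * delta, delta, 2 * delta, 4 * delta]
```

The dense pairwise distance matrix needs memory quadratic in the number of instances, so for larger data sets the average would have to be estimated from a sample of pairs.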

5.2. Results 

The results are shown in Tables 1 and 2. As can be seen, the performance of S3VM-us is competitive with
TSVM. In terms of average accuracy, TSVM performs slightly better (worse) than S3VM-us in the case
of 10 (100) labeled examples. In terms of pairwise comparison, S3VM-us performs better than TSVM on
13/12 and 14/16 cases with linear/Gaussian kernel for 10 and 100 labeled examples, respectively. Note
that in a number of cases, TSVM achieves a large performance improvement over inductive SVM, while the
improvement of S3VM-us is smaller. This is not surprising, since S3VM-us tries to improve performance
with the caution of avoiding performance degeneration.

Though TSVM achieves large improvements in a number of cases, it also suffers large performance
degeneration in other cases. Indeed, as can be seen from Tables 1 and 2, TSVM is significantly inferior to
inductive SVM on 8/44 and 19/44 cases for 10 and 100 labeled examples, respectively. Both S3VM-c and
S3VM-p are able to reduce the number of cases of significant performance degeneration, while S3VM-us
does not significantly degrade performance in any of the experiments.
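
The win/tie/loss counts rest on paired t-tests over the 30 repetitions. A minimal sketch of such a comparison is given below; the function names are hypothetical, and the critical value 2.045 is the standard two-sided 95% value for 29 degrees of freedom, hard-coded here on the assumption of exactly 30 paired runs.

```python
import math

def paired_t_outcome(acc_method, acc_svm, t_crit=2.045):
    """Return 'win', 'tie' or 'loss' for a method against inductive SVM.

    acc_method, acc_svm: accuracies from the same repetitions, in the same order.
    t_crit: two-sided 95% critical value (2.045 assumes 30 paired runs).
    """
    diffs = [a - b for a, b in zip(acc_method, acc_svm)]
    k = len(diffs)
    mean = sum(diffs) / k
    var = sum((d - mean) ** 2 for d in diffs) / (k - 1)   # sample variance
    if var == 0.0:
        # Constant differences: the sign of the mean decides.
        return "tie" if mean == 0.0 else ("win" if mean > 0 else "loss")
    t_stat = mean / math.sqrt(var / k)
    if t_stat > t_crit:
        return "win"
    if t_stat < -t_crit:
        return "loss"
    return "tie"

def win_tie_loss(results_method, results_svm):
    """Count outcomes over several settings (lists of per-run accuracies)."""
    outcomes = [paired_t_outcome(a, b) for a, b in zip(results_method, results_svm)]
    return tuple(outcomes.count(o) for o in ("win", "tie", "loss"))
```

Each of the 44 settings per table (22 data sets with two kernels) would contribute one outcome to a count such as 12/32/0.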

5.3. Parameter Influence 

S3VM-us has a parameter $\epsilon$. To study the influence of $\epsilon$, we run experiments by setting
$\epsilon$ to different values (0.1, 0.2 and 0.3) with 10 labeled examples. The results are plotted in
Figure 3. It can be seen that the setting of $\epsilon$ influences the improvement of S3VM-us over inductive
SVM. Whether the linear kernel or the Gaussian kernel is used, the larger the value of $\epsilon$, the
closer the performance of S3VM-us to
Table 1: Accuracy (mean ± std.) on 10 labeled examples. 'SVM' denotes inductive SVM which uses labeled data
only. For the semi-supervised methods (TSVM, S3VM-c, S3VM-p and S3VM-us), if the performance is
significantly better/worse than SVM, the corresponding entries are bolded/underlined (paired t-tests at 95%
significance level). The win/tie/loss counts with the fewest losses are bolded.

Data           SVM                  TSVM                 S3VM-c               S3VM-p               S3VM-us
               (linear/gaussian)    (linear/gaussian)    (linear/gaussian)    (linear/gaussian)    (linear/gaussian)
BCI            50.7±1.5/52.7±2.7    49.3±2.8/51.4±2.7    50.2±2.0/52.2±2.6    50.6±1.6/52.6±2.7    50.9±1.6/52.6±2.7
g241c          53.2±4.8/53.0±4.5    78.9±4.7/78.5±5.0    55.2±8.3/55.3±8.8    53.9±5.8/53.6±5.3    53.5±4.8/53.2±4.5
g241d          54.4±5.4/54.5±5.2    53.6±7.8/53.2±6.5    53.8±5.4/53.6±5.0    54.1±5.3/54.0±5.2    54.4±5.3/54.4±5.2
digit1         55.4±10.9/75.0±7.9   79.4±1.1/81.5±3.1    56.1±12.2/77.3±8.2   56.2±12.2/75.0±8.1   58.1±9.6/75.1±7.8
USPS           80.0±0.1/80.7±1.8    69.4±1.2/73.0±2.6    80.0±0.1/80.4±2.5    80.0±0.1/80.5±2.1    80.0±0.1/80.7±1.8
Text           54.7±6.3/54.6±6.3    71.4±11.7/71.2±11.4  56.8±8.8/56.5±8.7    55.3±6.6/55.2±6.8    58.0±9.0/57.8±8.9
house          90.0±6.0/84.8±11.8   84.6±8.0/84.7±6.9    89.8±6.2/84.8±11.9   89.5±6.0/84.5±11.8   90.1±6.1/85.4±11.4
heart          58.8±10.5/63.9±11.6  72.4±12.6/72.6±10.4  59.0±10.8/64.4±11.6  58.6±10.6/63.8±11.7  61.9±9.7/65.1±11.0
heart-statlog  74.6±4.8/69.9±10.1   74.9±6.6/73.9±5.9    74.5±5.2/70.1±10.2   74.5±4.9/70.0±10.2   74.2±5.4/71.7±6.9
ionosphere     70.4±8.7/65.8±9.8    72.0±10.5/76.1±8.2   70.9±9.0/66.1±9.9    70.4±8.7/66.0±9.7    70.7±8.3/67.4±6.7
vehicle        73.2±8.9/58.3±9.5    72.1±9.4/63.2±7.8    73.5±9.4/58.4±9.6    72.6±9.1/58.0±9.5    74.5±9.3/64.2±9.1
house-votes    85.5±7.0/79.7±10.7   83.8±6.1/84.0±5.3    85.7±7.0/80.1±10.6   85.3±6.9/79.7±10.7   86.0±5.7/84.3±6.1
wdbc           65.6±7.5/73.8±10.3   90.0±6.1/88.9±3.7    65.7±7.8/74.9±10.9   66.1±8.0/73.9±10.5   65.8±7.5/73.9±10.3
clean1         58.2±4.2/53.5±6.2    57.0±5.1/53.3±4.8    57.8±4.4/53.3±6.2    58.5±4.2/53.3±6.3    58.2±4.2/55.0±8.1
isolet         93.8±4.3/82.0±15.7   84.2±10.9/86.7±9.5   94.5±5.1/83.2±16.0   93.0±4.7/81.7±15.7   93.7±4.3/84.1±12.6
breastw        93.9±4.8/92.3±10.1   89.2±8.6/88.9±8.8    94.2±4.9/92.4±10.0   93.9±4.9/92.2±10.0   93.6±5.4/92.4±9.9
australian     70.4±9.2/60.3±8.4    69.6±11.9/68.6±11.4  70.1±9.8/60.4±8.3    70.5±9.4/60.5±8.8    70.3±9.2/60.8±7.9
diabetes       63.3±6.9/66.3±3.5    63.4±7.6/65.8±4.6    63.2±6.8/65.9±3.0    63.4±6.6/66.2±3.4    63.3±6.9/66.3±3.5
german         65.2±4.9/65.1±12.0   63.7±5.6/63.5±5.1    65.6±4.7/65.1±11.8   65.6±4.8/65.1±11.9   65.2±5.0/65.3±11.6
optdigits      96.1±3.2/92.8±9.6    89.8±9.2/91.4±7.6    96.6±3.1/93.6±9.9    95.6±3.0/92.4±9.8    96.9±2.5/94.9±5.8
ethn           56.5±8.8/58.5±10.2   64.2±13.5/68.1±14.5  56.5±8.6/59.4±11.6   56.8±9.1/58.6±10.7   59.8±10.7/61.8±11.3
sat            95.8±4.1/87.5±10.9   85.5±11.4/86.5±10.8  96.3±4.1/87.7±11.2   94.8±4.2/86.9±10.8   96.4±3.9/90.7±8.1
Aver. Acc.     70.9/69.3            73.5/73.8            71.2/69.8            70.9/69.3            71.6/70.8
SVM vs. Semi-Supervised: W/T/L      18/18/8              14/29/1              7/25/12              12/32/0

Table 2: Accuracy (mean ± std.) on 100 labeled examples. 'SVM' denotes inductive SVM which uses labeled data
only. For the semi-supervised methods (TSVM, S3VM-c, S3VM-p and S3VM-us), if the performance is
significantly better/worse than SVM, the corresponding entries are bolded/underlined (paired t-tests at 95%
significance level). The win/tie/loss counts with the fewest losses are bolded.

Data           SVM                  TSVM                 S3VM-c               S3VM-p               S3VM-us
               (linear/gaussian)    (linear/gaussian)    (linear/gaussian)    (linear/gaussian)    (linear/gaussian)
BCI            61.1±2.6
g241c          76.3±2.0
g241d          74.2±1.9
digit1         50.3±1.2
USPS           80.0±0.2
Text           73.8±3.3
house          95.7±2.0
heart          81.5±2.5
heart-statlog  81.5±2.4
ionosphere     87.1±1.5
vehicle        92.9±1.7
house-votes    92.3±1.3
clean1         73.0±2.7
wdbc           95.6±0.8
isolet         99.2±0.4
breastw        96.4±0.4
australian     83.8±1.6
diabetes       75.2±1.7
german         67.1±2.4
optdigits      99.4±0.3
ethn           91.6±1.6
sat            99.7±0.2

Figure 3: Influence of the parameter $\epsilon$ on the improvement of S3VM-us against inductive SVM.

SVM. It may be possible to increase the performance improvement by setting a smaller $\epsilon$; however,
this may increase the risk of performance degeneration.



6. Conclusion 

In this paper we propose the S3VM-us method. Rather than simply predicting all unlabeled instances
by the semi-supervised learner, S3VM-us uses hierarchical clustering to help select the unlabeled instances
to be predicted by the semi-supervised learner, and predicts the remaining unlabeled instances by the
inductive learner. In this way, the risk of performance degeneration caused by using unlabeled data is
reduced. The effectiveness of S3VM-us is validated by an empirical study.

The proposal in this paper is based on heuristics, and theoretical analysis is left as future work. It is
worth noting that, along with reducing the chance of performance degeneration, S3VM-us also reduces the
possible performance gains from unlabeled data. In the future it is desirable to develop truly safe
semi-supervised learning approaches which are able to improve performance significantly but never degrade
performance by using unlabeled data.